Unsupervised Dimension Reduction of High-Dimensional Data for Cluster Preservation
Abstract
High-dimensional data is receiving increasing attention in a growing number of application fields, but analysing such data has proven difficult because of the “curse of dimensionality”. Dimension reduction methods have emerged as successful tools for overcoming the problem of high dimensionality. However, even when they are designed to preserve the most important properties of the data, they are generally blind to the preservation of structures (e.g. multimodal distributions, clusters). In this paper, we propose a class of dimension reduction strategies, called High-Dimensional Multimodal Embedding (HDME), that aim to find low-dimensional representations of high-dimensional data that preserve cluster information. The difficulty of analysing high-dimensional data arises from the fact that, in high-dimensional representation spaces, all pairwise distances between points tend to become equal. To overcome this equidistance problem, HDME preprocesses the distances by scaling the distances between similar data points. Similarity may be estimated from neighbourhood, cluster or class information. We show that the neighbourhood-based variant is a competitive alternative to clustering. After the scaling, the points are embedded in a low-dimensional space using a distance-based embedding method. Experiments show that HDME is effective in terms of both retrieval and clustering when compared with known state-of-the-art methods operating in high-dimensional spaces. The code and data are available from http://viper.unige.ch/doku.php/viper_private:HDME.
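The two-stage pipeline described in the abstract (scale the distances between similar points, then apply a distance-based embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scaling function or the embedding method, so the sketch assumes k-nearest-neighbour similarity, a constant shrink factor, and classical multidimensional scaling. The function names and the parameters `k` and `factor` are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    D = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding
    np.fill_diagonal(D, 0.0)
    return D


def shrink_neighbour_distances(D, k=10, factor=0.5):
    """Multiply the distance between each point and its k nearest
    neighbours by `factor` (assumed constant; illustrative only)."""
    n = D.shape[0]
    masked = D.copy()
    np.fill_diagonal(masked, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(masked, axis=1)[:, :k]
    similar = np.zeros((n, n), dtype=bool)
    similar[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    similar |= similar.T  # keep the distance matrix symmetric
    scaled = D.copy()
    scaled[similar] *= factor
    return scaled


def classical_mds(D, dim=2):
    """Classical MDS, used here as a stand-in distance-based embedding."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    # Scaled distances need not be Euclidean, so clip negative eigenvalues.
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic clusters in 500 dimensions (toy data, not from the paper).
    X = np.vstack([rng.normal(0.0, 1.0, (50, 500)),
                   rng.normal(0.3, 1.0, (50, 500))])
    D = pairwise_distances(X)
    Y = classical_mds(shrink_neighbour_distances(D, k=10, factor=0.5))
    print(Y.shape)
```

Shrinking only the neighbour distances increases the contrast between "similar" and "dissimilar" pairs, which is the effect the abstract attributes to the distance processing step; the choice of `k` and `factor` here is arbitrary and would need tuning on real data.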
Similar resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Improving the Discriminant Model of Nonlinear Manifolds for Face Recognition with a Single Image per Person
Manifold learning is a dimension reduction method for extracting nonlinear structures of high-dimensional data. Many methods have been introduced for this purpose. Most of these methods usually extract a global manifold for data. However, in many real-world problems, there is not only one global manifold, but also additional information about the objects is shared by a large number of manifolds...
Unsupervised Kernel Dimension Reduction
We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for uns...
Multilayer bootstrap network for unsupervised speaker recognition
We apply multilayer bootstrap network (MBN), a recently proposed unsupervised learning method, to unsupervised speaker recognition. The proposed method first extracts supervectors from an unsupervised universal background model, then reduces the dimension of the high-dimensional supervectors by multilayer bootstrap network, and finally conducts unsupervised speaker recognition by clustering the l...
Integrated constraint based clustering algorithm for high dimensional data
Dimension selection, dimension weighting and data assignment are three circularly dependent, essential tasks in high dimensional data clustering, and each of them is challenging. To meet the challenge of high dimensional data clustering, constraints have been employed in several previous works. However, these constraint based algorithms use constraints to help accomplish only one of the three es...
Publication date: 2008